-
Notifications
You must be signed in to change notification settings - Fork 996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add Bunny.net object storage support #4325
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
I know about the pending implementation of S3 for bunny storage. I would say this is a similar situation to B2 with their native API and their S3 compatible API. I think it might be useful to have their native API supported for two reasons:
They advertise extremely low TTFB latency. I think it might be a good idea to have their native API implemented to compare to S3 when they offer it. Bunny's performance advantage could be caused partially by their own API which would justify having a bespoke implementation. |
S3 is "coming soon" since 2022... 💭 So API is the better option (also because there are no egress fees using the API). |
@l0wl3vel regarding this, if this can simplify things, is the external library |
Maybe, but I do not think that is a good idea to basically put an API SDK inside of a specialized codebase the size of JuiceFS. I thought about forking/redoing the @solracsf Would you be interested in building new bunnystorage-go library using more standard dependencies with me? If the JuiceFS maintainers are fine with pulling in the jamesponddotco/bunnystorage-go dependency for now I would prefer it to just get it merged and working on replacing the API SDK with another implementation later on. |
@l0wl3vel I'm not a Go developer, I'm afraid i can't help so much here 😿 |
We want to merge a stable code. And one that is useful to users. |
Would you consider merging the code in its current form? And if not, what changes would be required to consider it for merging? What kind of stability are we talking about? Upgrade Stability (Users can upgrade JuiceFS without need for migration), functional stability, dependency stability? Are there guidelines on what you would consider stable and what assurances are necessary to proof a certain kind of stability? |
@l0wl3vel Let me know what you need changed in the base module and I can see if I can find the time to work on it. At the very least, I'll remove the need for |
@jamesponddotco Hey, really appreciating you chiming in here. Thanks for writing bunnystorage-go. Here are a few things I think would improve it.
I think this is getting a bit off-topic here, but I already started converting bunnystorage-go to resty on my fork. Uploading does not work properly, unfortunately. Do you think you could move the main bunnystorage-go repo to github? No offense, I would really like to remove the dependency on sr.ht and your xstd-go entirely to make the module it more idiomatic and accessible. |
@l0wl3vel, thank you for the feedback, it's invaluable!
I mostly like to keep configuration in its own struct, hence why they're separate. This is personal preference, really, so I don't see it changing. If you look at other clients I wrote, like the
That was a mistake indeed, but fixing it would require a breaking change that I'm not sure I'm willing to make right now. I'll consider it, though. Same goes for that logging interface. I should probably have gone with something that would allow me to use
Using errors as constants is also a personal preference, but there is some reason behind it:
Another mistake that would require a breaking change to undo, one that I make on every client, really. Not sure if this will be undone, though.
Not happening, sorry. I much prefer the email workflow for working with code and git in general, and prefer SourceHut to GitHub as a platform for several reasons. I still use GitHub as a read-only mirror and would probably accept PRs over here since I can't disable them, but that is as far as I'll go with the platform. As for the usage of I'll look into those changes this week, but I assume you'll use your fork instead for this PR. Still, glad my code could be of some help, haha. Since this is indeed becoming off-topic, reach out to me at [james at cipher dot host] if you want to continue the conversation. |
Ok, as I see it, the plan is the following:
Does anyone have anything to add? |
|
I will review this pr again after the Spring Festival. |
previously the path parsed out of list responses did not contain the whole path. This commit allows juicefs delete, sync etc. to work properly
Required some fixes around error masking due to changed errors
Updated the bunny-storage-go-sdk to 0.0.10, which uses fasthttp instead of net/http and broke 1GiB/s in write speed and got some nice boost in read speed as well:
|
// Deleting keys that do not exist are expected to not throw an error | ||
func (b *bunnyClient) Delete(key string) error { | ||
if err := b.client.Delete(key, false); err != nil { | ||
if err.Error() == "Not Found" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message is confusing, we should only retry it when the key is ending with /
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or call b.client.Delete(key, strings.HasSuffix("/"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This workaround is necessary to pass the tests. The boolean flag on b.client.Delete()
only appends a / character to the path, which means that setting it at all should not be necessary.
Tests fail on repeated test runs with the following implementation:
func (b *bunnyClient) Delete(key string) error {
if err := b.client.Delete(key, false); err != nil {
if err.Error() == "Not Found" {
return nil
} else {
return err
}
}
return nil
}
Well, apparently some tests do not consistently pass paths with a trailing /. The first run works fine, but on any subsequent run a list test fails because there are empty directories left over from the first test run.
First run
=== RUN TestBunny
object_storage_test.go:131: PUT testEncodeFile failed: Unauthorized
object_storage_test.go:180: out-of-range get: '', but got 0, error: Requested Range Not Satisfiable
object_storage_test.go:284: list with delimiter is not supported
object_storage_test.go:326: list with delimiter is not supported
object_storage_test.go:527: https://storage.bunnycdn.com/juicefs-testunit-test/ does not support multipart upload: not supported
--- PASS: TestBunny (10.97s)
PASS
Object storage directory after the first run
Second Run
go1.20.13 test -v -tags gluster -run TestBunny github.com/juicedata/juicefs/pkg/object
=== RUN TestBunny
object_storage_test.go:131: PUT testEncodeFile failed: Unauthorized
object_storage_test.go:180: out-of-range get: '', but got 0, error: Requested Range Not Satisfiable
object_storage_test.go:222: First key should be test, but got a/
--- FAIL: TestBunny (0.65s)
FAIL
FAIL github.com/juicedata/juicefs/pkg/object 1.020s
FAIL
This workaround seems to be necessary, since it is not guaranteed that JuiceFS passes in the trailing /.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I retested it after replacing ListAll with List and the tests fail again. With or without my deletion workaround there are objects left in the unit-test directory. Subsequent runs will always fail again. Is it possible that testStorage does not handle the cleanup of the unit-test directory on object stores, that allow prefixes containing only empty directories? I would propose cleaning up the root directory of the storage prefix like this:
diff --git a/pkg/object/object_storage_test.go b/pkg/object/object_storage_test.go
index cfab5b97..a806c4ef 100644
--- a/pkg/object/object_storage_test.go
+++ b/pkg/object/object_storage_test.go
@@ -120,6 +120,7 @@ func testStorage(t *testing.T, s ObjectStorage) {
}
prefix := "unit-test/"
s = WithPrefix(s, prefix)
+ defer s.Delete("/")
defer func() {
if err := s.Delete("test"); err != nil {
t.Fatalf("delete failed: %s", err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this problem only appears on Object Storage providers that do support the concept of an empty directory, which S3 style APIs do not support. We would need to adjust the tests to clean up the unit-test directory properly.
pkg/object/bunny.go
Outdated
|
||
//listInParentDirectory := !strings.HasSuffix(prefix, "/") | ||
|
||
listedObjects, err := b.client.List(path.Dir(prefix)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take the file.go as an example
@l0wl3vel Hi, it seems there's still an issue relating to the 'list' method that is not solved yet. Are you still working on this? |
@SandyXSD I still want to finish it. I already have the changes implemented, but I can´t make sure it works, since Bunny Object Storage has the concept of empty directories, which trips up the test suite when running it multiple times, because there are empty directories left from the previous run. I just pushed my changes. |
} | ||
} else if marker == "" { // If Directory && no marker: Return prefix directory as well | ||
parentPath := path.Dir(path.Dir(prefix)) | ||
objects, err := b.client.List(parentPath+dirSuffix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just need to add the directory itself here, will list it later. Please follow the example in file.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what changes you would like to see. The code here is based on sftp.go, like you suggested before. Could you elaborate please on what about it is incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is how sftp implements List:
var objs []Object
dir := f.path(prefix)
if !strings.HasSuffix(dir, "/") {
dir = filepath.Dir(dir)
if !strings.HasSuffix(dir, dirSuffix) {
dir += dirSuffix
}
} else if marker == "" {
obj, err := f.Head(prefix)
if err != nil {
if os.IsNotExist(err) {
return nil, nil
}
return nil, err
}
objs = append(objs, obj)
}
infos, err := c.sftpClient.ReadDir(dir)
if err != nil {
if os.IsPermission(err) {
logger.Warnf("skip %s: %s", dir, err)
return nil, nil
}
if os.IsNotExist(err) {
return nil, nil
}
return nil, err
}
entries := f.sortByName(c.sftpClient, dir, infos, followLink)
for _, o := range entries {
key := o.Key()
if !strings.HasPrefix(key, prefix) || (marker != "" && key <= marker) {
continue
}
objs = append(objs, o)
if len(objs) == int(limit) {
break
}
}
When the marker is ''
, it try to return the prefix self (could be a directory). So instead of list the parent of prefix, can we just call Head() instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, but listing a prefix in S3 does not only return the virtual prefix "directory", rather all keys that start with that prefix. This would mean that I implemented the correct behavior. Working on this is also really difficult, since the tests for object storage are very incomplete and miss a lot of corner cases around listing objects.
@l0wl3vel please rebase to main branch and fix the conflict, I think we are very close to merge this one. |
This one is took over by #4719, thanks very much! |
This PR implements support for Bunny.net object storage.
Fixes #4317